# DhakaDrive: A 2D Bounding Box Annotated Dashcam Video Dataset for Unstructured Urban Traffic in Bangladesh

---

## Abstract

**DhakaDrive** is a large-scale, frame-level 2D bounding box annotated dashcam video dataset collected from real-world unstructured urban traffic in Dhaka, Bangladesh. It comprises high-resolution front dashcam recordings with multi-class, multi-level 2D bounding box annotations. The recordings capture the complex, densely populated road scenes characteristic of Dhaka, including local vehicle categories absent from existing Western or structured-traffic autonomous driving datasets.

The dataset is intended to support computer vision research tasks including object detection, classification, and tracking, with direct applicability to Autonomous Driving (AD) and Advanced Driver Assistance Systems (ADAS) in the context of the Global South.

---

## Dataset at a Glance

| Property                  | Details                                                    |
| ------------------------- | ---------------------------------------------------------- |
| **Dataset Name**          | DhakaDrive                                                 |
| **Domain**                | Computer Vision / Autonomous Driving / ADAS                |
| **Geography**             | Dhaka, Bangladesh (North to South coverage)                |
| **Recording Period**      | Two weeks (day and night conditions)                       |
| **Total Recordings**      | 173 video clips                                            |
| **Total Duration**        | 6 hours 18 minutes                                         |
| **Frame Rate**            | 30 frames per second (FPS), fixed                          |
| **Resolution**            | 2592 × 1944 pixels                                         |
| **Total Frames**          | ~634,000 frames                                            |
| **Total Bounding Boxes**  | ~2,180,000                                                 |
| **Object Classes**        | 12 distinct vehicle/transportation categories              |
| **Annotation Type**       | Frame-level 2D bounding boxes                              |
| **Annotation Strategies** | Semi-automated, Manual, Automated                          |
| **Privacy Measures**      | Faces and number plates blurred; GPS metadata removed      |
| **License**               | Creative Commons Attribution 4.0 International (CC BY 4.0) |

---

## License

This dataset is released under the **Creative Commons Attribution 4.0 International (CC BY 4.0)** license.

[![CC BY 4.0](https://licensebuttons.net/l/by/4.0/88x31.png)](https://creativecommons.org/licenses/by/4.0/)

Under this license, you are free to:

- **Share** — copy and redistribute the material in any medium or format
- **Adapt** — remix, transform, and build upon the material for any purpose, including commercially

Under the following terms:

- **Attribution** — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

**No additional restrictions** — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

This license is consistent with:

- The open access policy of **Data in Brief** (Elsevier), under which the associated data article is published.
- The licensing terms supported by **Harvard Dataverse**, where this dataset is deposited.

Full license text: [https://creativecommons.org/licenses/by/4.0/legalcode](https://creativecommons.org/licenses/by/4.0/legalcode)

See [LICENSE](./LICENSE) for the full terms included with this deposit.

---

## Repository Structure

```
DhakaDrive/
├── previews/                            # Sample annotated preview videos
│   ├── sample_annotated_video.mp4
│   └── ...
│
├── videos/                              # Raw high-resolution dashcam video clips
│   ├── train/                           # Training split (121 clips)
│   │   ├── DD_YYYYMMDD_HHMMSS_XXXXXXF.MP4
│   │   └── ...
│   ├── val/                             # Validation split (25 clips)
│   │   ├── DD_YYYYMMDD_HHMMSS_XXXXXXF.MP4
│   │   └── ...
│   └── test/                            # Test split (27 clips)
│       ├── DD_YYYYMMDD_HHMMSS_XXXXXXF.MP4
│       └── ...
│
├── annotations/                         # Frame-level 2D bounding box annotations (XML)
│   ├── semi_automated/                  # Hybrid: model-generated + human-reviewed
│   │   ├── val/                         # 20 annotation files
│   │   │   ├── DD_YYYYMMDD_HHMMSS_XXXXXXF.xml
│   │   │   └── ...
│   │   └── test/                        # 21 annotation files
│   │       ├── DD_YYYYMMDD_HHMMSS_XXXXXXF.xml
│   │       └── ...
│   ├── manual/                          # Fully human-annotated (highest confidence)
│   │   ├── train/                       # 18 annotation files
│   │   │   ├── DD_YYYYMMDD_HHMMSS_XXXXXXF.xml
│   │   │   └── ...
│   │   ├── val/                         # 5 annotation files
│   │   │   ├── DD_YYYYMMDD_HHMMSS_XXXXXXF.xml
│   │   │   └── ...
│   │   └── test/                        # 6 annotation files
│   │       ├── DD_YYYYMMDD_HHMMSS_XXXXXXF.xml
│   │       └── ...
│   └── automated/                       # Fully model-generated, no human review
│       └── train/                       # 103 annotation files
│           ├── DD_YYYYMMDD_HHMMSS_XXXXXXF.xml
│           └── ...
│
└── README.md                            # This file
```

---

## File Descriptions

### `previews/`

Contains two short MP4 clips rendered with visible bounding-box overlays for quick visual inspection of annotation quality and scene characteristics. Preview files are provided for reference only and are **not** part of any train/val/test split.

---

### `videos/`

Contains the full set of 173 raw, high-resolution front dashcam MP4 video clips organized into three standard machine learning splits. Recordings cover Dhaka from North to South, captured over two weeks under both daytime and nighttime conditions at a fixed 30 FPS and a resolution of 2592 × 1944 pixels.

**File naming convention:**

```
DD_YYYYMMDD_HHMMSS_XXXXXXF.MP4
```

| Token      | Description                                      |
| ---------- | ------------------------------------------------ |
| `DD`       | Dataset prefix (DhakaDrive)                      |
| `YYYYMMDD` | Recording date (e.g., `20250115` = Jan 15, 2025) |
| `HHMMSS`   | Recording start time (24-hour format)            |
| `XXXXXXF`  | Unique clip identifier / frame index             |
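
The naming convention above can be parsed programmatically. The following is a minimal sketch, not part of the dataset tooling: the helper name `parse_clip_name`, the regular expression, and the example filename are illustrative assumptions, and the trailing `XXXXXXF` token is kept as an opaque string because its internal structure is not specified here.

```python
import re
from datetime import datetime
from typing import NamedTuple


class ClipName(NamedTuple):
    start: datetime  # recording date and start time
    clip_id: str     # trailing XXXXXXF token, kept as an opaque string
    extension: str   # file extension without the dot


# Assumed pattern: DD_<8 digits>_<6 digits>_<token>.<extension>
_CLIP_RE = re.compile(r"^DD_(\d{8})_(\d{6})_([^_.]+)\.([A-Za-z0-9]+)$")


def parse_clip_name(filename: str) -> ClipName:
    """Split a DhakaDrive-style filename into its documented tokens."""
    match = _CLIP_RE.match(filename)
    if match is None:
        raise ValueError(f"not a DhakaDrive clip name: {filename!r}")
    date, time, clip_id, extension = match.groups()
    # strptime also rejects impossible dates or times such as month 13.
    start = datetime.strptime(date + time, "%Y%m%d%H%M%S")
    return ClipName(start, clip_id, extension)


# Hypothetical filename, used only to illustrate the convention.
info = parse_clip_name("DD_20250115_143000_000123F.MP4")
print(info.start.isoformat(), info.clip_id, info.extension)
```

Because annotation files share the base filename of their clip, the same function should work on `.xml` names as well.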

**Split summary:**

| Split     | Clips   | Conditions    |
| --------- | ------- | ------------- |
| `train/`  | 121     | Day and night |
| `val/`    | 25      | Day and night |
| `test/`   | 27      | Day and night |
| **Total** | **173** |               |
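
After downloading, the split sizes in the table above can be checked against the local copy. This is a sketch under the assumption that the `videos/` directory is laid out as shown in the repository structure; the function names `count_clips` and `check_splits` are illustrative.

```python
from pathlib import Path

# Clip counts per split, taken from the split summary table.
EXPECTED_CLIPS = {"train": 121, "val": 25, "test": 27}


def count_clips(videos_dir: Path) -> dict[str, int]:
    """Count .mp4 files (case-insensitive) in each split directory."""
    counts = {}
    for split in EXPECTED_CLIPS:
        split_dir = videos_dir / split
        if not split_dir.is_dir():
            counts[split] = 0  # a missing split directory counts as empty
            continue
        counts[split] = sum(
            1
            for path in split_dir.iterdir()
            if path.is_file() and path.suffix.lower() == ".mp4"
        )
    return counts


def check_splits(videos_dir: Path) -> list[str]:
    """Return one message per mismatching split; an empty list means all match."""
    counts = count_clips(videos_dir)
    return [
        f"{split}: expected {expected}, found {counts[split]}"
        for split, expected in EXPECTED_CLIPS.items()
        if counts[split] != expected
    ]
```

Calling `check_splits(Path("DhakaDrive/videos"))` on a complete download should return an empty list.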

---

### `annotations/`

Each annotation file is an XML document that shares its base filename with the corresponding video clip and contains frame-level 2D bounding box coordinates for all annotated objects across the 12 vehicle categories. Three annotation strategies are provided to support research into annotation quality, label noise, and model robustness.

#### `semi_automated/`

Annotations produced by a hybrid pipeline: an automated model generates initial bounding boxes, which are then reviewed and corrected by human annotators. Semi-automated annotations are provided for the **val** and **test** splits only.

#### `manual/`

Annotations produced entirely by trained human annotators without algorithmic assistance. These represent the highest-confidence ground truth labels in the dataset and are provided for all three splits: **train**, **val**, and **test**.

#### `automated/`

Annotations generated entirely by an automated detection model without human review. They are provided for the **train** split only and are intended for research into pseudo-labeling, weakly supervised learning, and annotation noise analysis.

**Annotation coverage by strategy and split:**

| Strategy         | Train         | Val          | Test         |
| ---------------- | ------------- | ------------ | ------------ |
| `semi_automated` | —             | ✓ (20 files) | ✓ (21 files) |
| `manual`         | ✓ (18 files)  | ✓ (5 files)  | ✓ (6 files)  |
| `automated`      | ✓ (103 files) | —            | —            |
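
Since the per-split file counts in the table add up to the clip counts (for example, 18 + 103 = 121 for train), each clip appears to have an annotation under exactly one strategy. Under that assumption, the sketch below builds a lookup from clip to annotation file and strategy. The function names are illustrative, and if a clip were annotated under more than one strategy, the last one encountered would win.

```python
from pathlib import Path
from typing import Optional

# Strategy -> splits that carry annotations, per the coverage table.
COVERAGE = {
    "semi_automated": ("val", "test"),
    "manual": ("train", "val", "test"),
    "automated": ("train",),
}


def index_annotations(annotations_dir: Path) -> dict[tuple[str, str], tuple[str, Path]]:
    """Map (split, clip base name) -> (strategy, path to XML file)."""
    index = {}
    for strategy, splits in COVERAGE.items():
        for split in splits:
            split_dir = annotations_dir / strategy / split
            if not split_dir.is_dir():
                continue
            for xml_path in sorted(split_dir.glob("*.xml")):
                index[(split, xml_path.stem)] = (strategy, xml_path)
    return index


def annotation_for(video_path: Path, index: dict) -> Optional[tuple[str, Path]]:
    """Look up the annotation of a clip stored as videos/<split>/<name>.MP4."""
    return index.get((video_path.parent.name, video_path.stem))
```

Recording the strategy alongside the path makes it straightforward to filter training data by label provenance, for instance to train on `manual` annotations only.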

---

### `README.md`

This file. Provides dataset overview, structure documentation, annotation format, intended use, citation information, and licensing terms.

---

## Annotation Format

Annotations follow a PASCAL VOC–style XML structure providing frame-level 2D bounding box coordinates. A representative example:

```xml
<annotation>
  <filename>DD_YYYYMMDD_HHMMSS_XXXXXXF.MP4</filename>
  <size>
    <width>2592</width>
    <height>1944</height>
    <depth>[channels]</depth>
  </size>
  <object>
    <name>[vehicle_class]</name>
    <bndbox>
      <xmin>[x1]</xmin>
      <ymin>[y1]</ymin>
      <xmax>[x2]</xmax>
      <ymax>[y2]</ymax>
    </bndbox>
  </object>
</annotation>
```

> **Note:** Verify the exact schema against representative annotation files in the dataset before use, as additional fields may be present depending on annotation level.
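
A file following the representative structure above can be read with Python's standard library. This is a minimal sketch that assumes only the `<object>`, `<name>`, and `<bndbox>` elements shown in the example; the class label and coordinates in `SAMPLE` are placeholder values, and any per-frame fields present in the real files are not handled.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class Box(NamedTuple):
    label: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def parse_boxes(xml_text: str) -> list[Box]:
    """Extract every <object> bounding box from a VOC-style annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    # iter() also finds <object> elements nested deeper than the root.
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        if bndbox is None:
            continue  # skip objects without a box
        # float() raises if a coordinate tag is missing or non-numeric.
        coords = [
            float(bndbox.findtext(tag))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        ]
        boxes.append(Box(obj.findtext("name", default=""), *coords))
    return boxes


# Placeholder annotation used only to exercise the parser.
SAMPLE = """
<annotation>
  <size><width>2592</width><height>1944</height></size>
  <object>
    <name>rickshaw</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>300</xmax><ymax>450</ymax></bndbox>
  </object>
</annotation>
"""

for box in parse_boxes(SAMPLE):
    print(box.label, box.xmax - box.xmin, box.ymax - box.ymin)
```

For files on disk, `ET.parse(path).getroot()` can replace `ET.fromstring`, with the rest of the function unchanged.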

---

## Object Classes

The dataset provides ground truth annotations for **12 distinct object classes**, representing transportation vehicles currently in use in Bangladesh. These include both conventional vehicle types and locally prevalent categories not found in standard western autonomous driving benchmarks.

> [List the 12 classes explicitly here once confirmed, e.g., rickshaw, CNG auto-rickshaw, bus, truck, motorcycle, car, bicycle, etc.]

---

## Data Collection

Video recordings were captured using a front-mounted dashcam on a vehicle traversing Dhaka city from North to South over a two-week period. The recordings encompass a wide range of traffic conditions, road types, lighting environments (day and night), and traffic densities representative of Dhaka's unstructured urban road scenes.

---

## Privacy and Ethics

To protect the privacy of individuals present in the recordings:

- **Pedestrian faces** are blurred across all frames.
- **Vehicle number plates** are blurred across all frames.
- **GPS coordinates** are removed from all video metadata.

All data was collected in public spaces in accordance with applicable local regulations. Users of this dataset are responsible for compliance with applicable privacy regulations in their jurisdiction.

---

## Intended Use

This dataset is intended for:

- Object detection and classification in unstructured South Asian urban traffic environments
- Training and evaluating autonomous driving and ADAS models for Global South road conditions
- Benchmarking detection performance on locally specific vehicle morphologies not covered by existing datasets
- Studying the impact of annotation strategy (manual vs. semi-automated vs. automated) on model performance
- Research in domain adaptation between structured (western) and unstructured traffic environments
- Video-based multi-class detection under real-world occlusion, high density, and mixed traffic scenarios
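
For the benchmarking and annotation-strategy comparisons listed above, intersection over union (IoU) is the usual measure of agreement between two boxes. The sketch below is a generic implementation for boxes given as `(xmin, ymin, xmax, ymax)` in pixel coordinates; it is not the evaluation protocol of the associated article.

```python
def iou(a: tuple[float, float, float, float],
        b: tuple[float, float, float, float]) -> float:
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    # Corners of the overlapping rectangle, if the boxes overlap at all.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # Two degenerate (zero-area) boxes have no meaningful overlap.
    return inter / union if union > 0 else 0.0


# Two 10 x 10 boxes offset by half their width overlap in a 5 x 10 region.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

A detection is commonly counted as correct when its IoU with a ground-truth box of the same class exceeds a chosen threshold such as 0.5.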

---

## Citation

If you use this dataset in your research, please cite the associated Data in Brief article:

```bibtex
@article{dhakadrive2025,
  title     = {DhakaDrive: A Large-Scale Front Dashcam Video Dataset with Multi-Class and Multi-Level 2D Annotations for Unstructured Urban Traffic in Bangladesh},
  author    = {[Author(s)]},
  journal   = {Data in Brief},
  year      = {[Year]},
  volume    = {[Volume]},
  pages     = {[Pages]},
  doi       = {[DOI]}
}
```

---

## Contact

For questions regarding this dataset, please contact:

**[Principal Investigator / Corresponding Author]**
[Affiliation]
[Email Address]

---

## Acknowledgements

[Funding body, institution, or individuals to acknowledge.]

---

_Deposited on Harvard Dataverse under CC BY 4.0. Published in Data in Brief (Elsevier)._
